cual-id: Globally Unique, Correctable, and Human-Friendly Sample Identifiers for Comparative Omics Studies

Authors

  • John H. Chase
  • Evan Bolyen
  • Jai Ram Rideout
  • J. Gregory Caporaso

Abstract

The number of samples in high-throughput comparative "omics" studies is increasing rapidly due to declining experimental costs. To keep sample data and metadata manageable, and to ensure the integrity of scientific results as the scale of these projects continues to grow, it is essential that we transition to better-designed sample identifiers. Ideally, sample identifiers should be globally unique across projects, project teams, and institutions; short (to facilitate manual transcription); correctable with respect to common types of transcription errors; opaque, meaning that they do not contain information about the samples; and compatible with existing standards. We present cual-id, a lightweight command line tool that creates, or mints, sample identifiers that meet these criteria without reliance on centralized infrastructure. cual-id allows users to assign universally unique identifiers (UUIDs), which are globally unique, to their samples. However, UUIDs are too long to be conveniently written on sampling materials such as swabs or microcentrifuge tubes, so cual-id additionally generates human-friendly 4- to 12-character identifiers that map to their UUIDs and are unique within a project. By convention, we use "cual-id" to refer to the software, "CualID" to refer to the short, human-friendly identifiers, and "UUID" to refer to the globally unique identifiers. CualIDs are used by humans when they manually write or enter identifiers, while the longer UUIDs are used by computers to unambiguously reference a sample. Finally, cual-id can optionally generate printable sheets of label stickers containing Code 128 bar codes and CualIDs for labeling sample collection and processing materials.

Importance

The adoption of identifiers that are globally unique, correctable, and easily handwritten or manually entered into a computer will be a major step forward for sample tracking in comparative omics studies. As these fields transition to more centralized sample management (for example, across labs within an institution, across projects funded under a common program, or in systems designed to facilitate meta-analysis and/or integrated analysis), sample identifiers generated with cual-id will not need to change; thus, costly and error-prone updating of data and metadata identifiers will be avoided. Further, using cual-id will ensure that transcription errors in sample identifiers do not force researchers to discard otherwise useful samples that may have been expensive to obtain. Finally, cual-id is simple to install and use, and it is free for all use. No centralized infrastructure is required to ensure global uniqueness, so any lab can start using these identifiers within its existing infrastructure.
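The abstract states that CualIDs are correctable with respect to common transcription errors and map to UUIDs, but it does not describe how. The Python sketch below shows one standard way to obtain both properties; it is an illustration under stated assumptions, not cual-id's actual implementation or API. The alphabet `ALPHABET`, the functions `levenshtein`, `mint`, and `correct`, the 6-character length (picked from the 4- to 12-character range mentioned above), and the minimum edit distance of 3 are all hypothetical choices. The UUIDs come from Python's standard `uuid.uuid4()`.

```python
import random
import uuid
from typing import Dict, Optional

# Hypothetical alphabet (an assumption, not taken from cual-id): digits and
# uppercase letters, minus characters easily confused when handwritten
# (0/O, 1/I/L).
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions, or
    deletions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def mint(n: int, length: int = 6, min_distance: int = 3,
         seed: Optional[int] = None,
         max_attempts: int = 100_000) -> Dict[str, uuid.UUID]:
    """Mint `n` short IDs, each mapped to a fresh random (version 4) UUID.

    A candidate is kept only if it is at least `min_distance` edits away from
    every ID minted so far (simple rejection sampling).
    """
    rng = random.Random(seed)  # seeds only the short IDs; uuid4() stays random
    ids: Dict[str, uuid.UUID] = {}
    for _ in range(max_attempts):
        if len(ids) == n:
            break
        candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
        if all(levenshtein(candidate, other) >= min_distance for other in ids):
            ids[candidate] = uuid.uuid4()
    if len(ids) < n:
        raise RuntimeError("could not mint enough well-separated IDs")
    return ids


def correct(observed: str, ids: Dict[str, uuid.UUID]) -> Optional[str]:
    """Return the single minted ID within one edit of `observed`, or None if
    there is no match or the match is ambiguous."""
    normalized = observed.strip().upper()
    matches = [known for known in ids if levenshtein(normalized, known) <= 1]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    project_ids = mint(5, seed=42)
    for short_id, full_id in project_ids.items():
        print(short_id, full_id)
    first = next(iter(project_ids))
    typo = first[:-1]  # simulate a dropped last character
    print(typo, "->", correct(typo, project_ids))
```

Requiring every pair of short IDs to differ by at least three edits is what makes single errors correctable in this sketch: a typo with one substitution, insertion, or deletion stays within one edit of its true ID and, by the triangle inequality, at least two edits from every other ID, so the nearest match is unique. The trade-off is that minting uses rejection sampling with pairwise comparisons, which is adequate for project-sized sets but would need a smarter search for very large ones.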

Similar Resources

IDGenerator: unique identifier generator for epidemiologic or clinical studies

BACKGROUND Creating study identifiers and assigning them to study participants is an important feature in epidemiologic studies, ensuring the consistency and privacy of the study data. The numbering system for identifiers needs to be random within certain number constraints, to carry extensions coding for organizational information, or to contain multiple layers of numbers per participant to di...

Guidelines for Use of Extended Unique Identifier (EUI), Organizationally Unique Identifier (OUI), and Company ID (CID)

This tutorial covers organizational identifiers assigned by the IEEE Registration Authority (IEEE RA) and extended identifiers based on them. It covers identifier formats, assignment, guidelines, and policies relevant to assignees as well as to standards developers. The tutorial includes information relevant to organizational identifiers, such as Organizationally Unique Identifier (OUI) and Com...

Assignment of System Identifiers for TUBA/CLNP Hosts

This memo specifies methods for assigning a 6 octet system identifier portion of the OSI NSAP address formats described in "Guidelines for OSI NSAP Allocation in the Internet" [1], in a fashion that ensures that the ID is unique within a routing domain. It also recommends methods for assigning system identifiers having lengths other than 6 octets. The 6 octet system identifiers recommended in t...

WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit

Functional enrichment analysis has played a key role in the biological interpretation of high-throughput omics data. As a long-standing and widely used web application for functional enrichment analysis, WebGestalt has been constantly updated to satisfy the needs of biologists from different research areas. WebGestalt 2017 supports 12 organisms, 324 gene identifiers from various databases and t...

Compromising the Security of “Generating Unique identifiers from Patient Identification Data Using Security Models”

Sir, I write with respect to the Technical Note “Generating unique identifiers (IDs) from patient identification data using security models,”[1] the authors of which propose a method to “create a unique one‐way encrypted ID per patient that can be used for data sharing.” In summary, their method involves concatenation of a patient’s date of birth, sex, and surname, utilizing either the MD5 or S...

Volume 1
Publication date: 2016